11 research outputs found

    Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

    Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states that are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm that back-propagates through only a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention. Comment: To appear as a Spotlight presentation at NIPS 201
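The idea of attending to only a few relevant past states can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the dot-product scoring, the hard top-k selection, and the function name `sparse_topk_attention` are all assumptions made for illustration, and the sketch covers only the forward selection step, not training or gradient flow.

```python
import numpy as np

def sparse_topk_attention(query, past_states, k):
    """Summarize the k most relevant past states (hypothetical helper).

    Assumption: relevance is a plain dot product; the abstract only says
    the attention mechanism is learned, not how it is scored.
    """
    scores = past_states @ query                  # one score per past state, shape (T,)
    k = min(k, scores.shape[0])
    top_idx = np.argpartition(scores, -k)[-k:]    # indices of the k largest scores
    selected = scores[top_idx]
    exp = np.exp(selected - selected.max())       # numerically stable softmax over the selection
    weights = np.zeros_like(scores, dtype=float)  # every unselected state keeps weight zero
    weights[top_idx] = exp / exp.sum()
    summary = weights @ past_states               # weighted sum of the selected states, shape (d,)
    return summary, weights

rng = np.random.default_rng(0)
past = rng.normal(size=(50, 8))   # 50 stored past states of dimension 8
current = rng.normal(size=8)      # current state acting as the query
summary, weights = sparse_topk_attention(current, past, k=3)
```

Because all but `k` weights are exactly zero, any gradient computed through `summary` would reach only the selected past states, which is the sparsity property the abstract describes.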

    Deep Complex Networks

    At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch normalization and complex weight initialization strategies for complex-valued neural nets, and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset, and on speech spectrum prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.
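A complex convolution can be expressed with real-valued operations only, using the identity (A + iB) * (x + iy) = (A * x - B * y) + i(B * x + A * y). The sketch below checks this in one dimension with NumPy. The function name `complex_conv1d` and the 1D setting are illustrative assumptions, not the paper's implementation, and `np.convolve` flips the kernel, whereas deep learning layers usually compute cross-correlation; the identity holds in both cases because the operation is linear.

```python
import numpy as np

def complex_conv1d(x_real, x_imag, w_real, w_imag):
    """Complex 1D convolution built from four real convolutions (hypothetical helper)."""
    # Real part: w_real * x_real - w_imag * x_imag
    out_real = (np.convolve(x_real, w_real, mode="valid")
                - np.convolve(x_imag, w_imag, mode="valid"))
    # Imaginary part: w_imag * x_real + w_real * x_imag
    out_imag = (np.convolve(x_real, w_imag, mode="valid")
                + np.convolve(x_imag, w_real, mode="valid"))
    return out_real, out_imag

rng = np.random.default_rng(0)
x = rng.normal(size=16) + 1j * rng.normal(size=16)   # complex input signal
w = rng.normal(size=3) + 1j * rng.normal(size=3)     # complex kernel

out_real, out_imag = complex_conv1d(x.real, x.imag, w.real, w.imag)
reference = np.convolve(x, w, mode="valid")          # NumPy's native complex convolution
matches = np.allclose(out_real + 1j * out_imag, reference)
```

Keeping the real and imaginary parts as separate real arrays is what lets such a layer be assembled from standard real-valued convolution routines.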

    Applications of complex numbers to deep neural networks

    In the past decade, a convergence of hardware, software, and theory has allowed artificial intelligence to experience a renewal: a "spring" that, unlike previous ones, seems to have led not to a burst hype bubble and a new "AI winter", but to a lasting "summer", anchored by tangible advances in the field. One of the key advances is truly "deep" learning. In many applications, the architects of neural networks have had great success by deepening them, and there is now little doubt about the value of deep, composable, hierarchical representations learned automatically from examples. But there exist other, less-explored avenues for research, such as alternatives to the most commonly used number system, the real numbers: low-precision numbers, complex numbers, and quaternions. In 2017, one of my primary collaborators and I discussed the seeming lack of interest in purely complex-valued processing of digital signals, whether directly available in complex form or convertible to it using, for example, the Fourier transform (1D, 2D, short-time or not). Since this area seemed under-explored, we threw ourselves into it and, after a year spent dealing with the challenges of architecture and initialization in neural networks with purely complex-valued internal representations, we obtained good results in computer vision and music spectrum prediction. We also expose the pitfalls of naively initializing and normalizing such complex-valued networks and solve them with custom formulations adapted for the use of complex numbers.